This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Add modeling service for abuse prevention #551

Merged: 9 commits merged on Sep 17, 2020

Conversation

sethvargo
Member

Part of GH-534

Release Note

Add modeling service for abuse detection (and prevention in the future)

@googlebot googlebot added the cla: yes Auto: added by CLA bot when all committers have signed a CLA. label Sep 16, 2020
@google-cla google-cla bot added the cla: yes Auto: added by CLA bot when all committers have signed a CLA. label Sep 16, 2020
@sethvargo sethvargo force-pushed the sethvargo/modeler branch 5 times, most recently from c699155 to 2556985 on September 16, 2020 18:58
Contributor

@jeremyfaller jeremyfaller left a comment


Just some comments, feel free to ignore in the name of expeditiousness.

pkg/controller/modeler/modeler.go (resolved)
pkg/controller/modeler/modeler.go (outdated, resolved)
pkg/controller/modeler/modeler_test.go (resolved)

// This is probably overkill, but it enables us to pick a different curve in
// the future, if we want.
degree := 2
Contributor


This is really only a comment, since your degree is quite low, and it's probably not relevant because I believe this just ends up being a linear regression right now. But higher-order polynomials could put you in a world where you're sensitive to things like weekends skewing your data, among other things.

Member Author


I spent most of yesterday playing with models. Choosing a higher-degree polynomial gives us a much higher r² value, but it significantly skews the data given how small the set is.

But yeah, when I tried an 11th-degree polynomial, it fit every historical value to within 5 of the curve, but then the "next" value was 14 orders of magnitude higher.

// be over at 00:00 UTC, and we don't want to generate a partial model.
ys = ys[:len(ys)-1]

// Reverse the list - it came in reversed because we sorted by date DESC, but
Contributor


Wanna just ask for the results ascending above? I am not enough of an SQL hacker to know if there's a query that does what you want there.

Member Author


We're building the regression model based on the past 21 days of data, so we need the SQL query to be "the last 21 records" (order by date DESC). I don't think there's a SQL way to ORDER BY and take the last elements.
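A hypothetical Go helper for the ordering question above: it takes daily totals in the order a `... ORDER BY date DESC LIMIT 21` query would return them (newest first), reverses them into chronological order, and then drops the newest entry because that day may still be in progress. The function name and the choice to drop after reversing are assumptions for illustration, not the PR's code.

```go
package main

import "fmt"

// newestFirstToModelInput (hypothetical) reverses newest-first daily totals
// into oldest-first order, then drops the final (newest) element, since the
// current day may not be complete yet.
func newestFirstToModelInput(newestFirst []float64) []float64 {
	ys := make([]float64, len(newestFirst))
	for i, v := range newestFirst {
		ys[len(newestFirst)-1-i] = v
	}
	if len(ys) == 0 {
		return ys
	}
	return ys[:len(ys)-1]
}

func main() {
	// Newest first: today's partial total 9, then 30, 20, 10 going back in time.
	fmt.Println(newestFirstToModelInput([]float64{9, 30, 20, 10}))
}
```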

pkg/controller/modeler/modeler.go (outdated, resolved)
cmd/modeler/main.go (resolved)
cmd/server/assets/realm.html (resolved)
pkg/controller/modeler/modeler.go (resolved)
pkg/controller/modeler/modeler.go (outdated, resolved)
pkg/controller/modeler/modeler.go (resolved)

// Require some reasonable number of days of history before attempting to
// build a model.
if l := len(ys); l < 14 {
Contributor


What if there are gaps?

Like, we actually have 14 days of data, but some of those days are 0.

Member Author


As implemented, they are skipped. There wouldn't be a corresponding date, so there'd be no corresponding zero.

The original query I wrote actually took that into account, using generate_series and a cross join. However, a significant spike-drop-spike severely throws off the model (e.g. 100, 0, 80), so I'd rather just exclude zeros for now during modeling.

@sethvargo
Member Author

@mikehelmick Updated. PTAnotherL (please take another look).

@icco
Contributor

icco commented Sep 17, 2020

Merge conflicts.

@sethvargo
Member Author

Rebased @icco

@icco
Contributor

icco commented Sep 17, 2020

/approve
/lgtm

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: icco, sethvargo


@google-oss-robot google-oss-robot merged commit a802c5f into main Sep 17, 2020
@google-oss-robot google-oss-robot deleted the sethvargo/modeler branch September 17, 2020 17:44
@google google locked and limited conversation to collaborators Oct 6, 2020
Labels
cla: yes Auto: added by CLA bot when all committers have signed a CLA.
7 participants